Dataset statistics
| Number of variables | 22 |
|---|---|
| Number of observations | 597659 |
| Missing cells | 4895030 |
| Missing cells (%) | 37.2% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 139.1 MiB |
| Average record size in memory | 244.0 B |
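The percentage and per-record figures in this table follow from the raw counts. A minimal sketch of that arithmetic, using only the numbers reported above and assuming that MiB means 2^20 bytes:

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 22
n_observations = 597_659
missing_cells = 4_895_030
total_size_mib = 139.1  # assumed to be mebibytes (2**20 bytes)

# Share of missing cells over the full rows x columns grid.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size: total bytes divided by the number of rows.
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Because the total size is itself rounded to one decimal place, the recomputed record size can only be expected to agree with the table to about one decimal place.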
Variable types
| NUM | 12 |
|---|---|
| UNSUPPORTED | 7 |
| CAT | 3 |
| Warning | Type |
|---|---|
| operation_car has constant value "3" (597659 of 597659 observations) | Constant |
| operation_date has a high cardinality: 19127 distinct values | High cardinality |
| operation_st_esr is highly correlated with destination_esr and 1 other fields | High correlation |
| destination_esr is highly correlated with operation_st_esr and 1 other fields | High correlation |
| ssp_station_esr is highly correlated with destination_esr and 1 other fields | High correlation |
| ssp_station_id is highly correlated with operation_st_id | High correlation |
| operation_st_id is highly correlated with ssp_station_id | High correlation |
| length has 597659 (100.0%) missing values | Missing |
| adm has 597659 (100.0%) missing values | Missing |
| danger has 597659 (100.0%) missing values | Missing |
| gruz has 597659 (100.0%) missing values | Missing |
| receiver has 597659 (100.0%) missing values | Missing |
| rod_train has 355400 (59.5%) missing values | Missing |
| sender has 597659 (100.0%) missing values | Missing |
| tare_weight has 597659 (100.0%) missing values | Missing |
| weight_brutto has 355394 (59.5%) missing values | Missing |
| df_index has unique values | Unique |
| length is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| adm is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| danger is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| gruz is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| receiver is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| sender is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| tare_weight is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
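The missing-value warnings can be reconciled with the 4895030 missing cells reported in the dataset statistics. The sketch below adds the seven fully missing columns, the two partially missing columns named in the warnings, and the smaller "Missing" counts that appear in the per-variable tables of this report (three variables with 123, one with 1, one with 253). The grouping into three lists is only an illustration.

```python
n_observations = 597_659

# Columns flagged as 100% missing in the warnings.
fully_missing = ["length", "adm", "danger", "gruz",
                 "receiver", "sender", "tare_weight"]

# Partially missing columns named in the warnings.
partially_missing = {"rod_train": 355_400, "weight_brutto": 355_394}

# Smaller "Missing" counts taken from the per-variable tables
# (these fall below the warning threshold and are not listed above).
minor_missing = [123, 123, 123, 1, 253]

total_missing = (
    len(fully_missing) * n_observations
    + sum(partially_missing.values())
    + sum(minor_missing)
)
print(total_missing)
```

The seven fully missing columns alone contribute 4183613 cells, so dropping them before further analysis would remove most of the missingness.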
Reproduction
| Analysis started | 2021-04-14 19:27:11.836499 |
|---|---|
| Analysis finished | 2021-04-14 19:28:27.661045 |
| Duration | 1 minute and 15.82 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
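The reported duration can be checked directly from the two timestamps. A small sketch using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2021-04-14 19:27:11.836499")
finished = datetime.fromisoformat("2021-04-14 19:28:27.661045")

# Elapsed wall-clock time in seconds, then split into minutes and seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)

print(f"{int(minutes)} minute and {seconds:.2f} seconds")
```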
df_index
Real number (ℝ≥0)
| Distinct | 597659 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2134349.342 |
|---|---|
| Minimum | 9 |
| Maximum | 4189912 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 4.6 MiB |
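The 4.6 MiB memory size shown for each variable is consistent with eight bytes per row. The sketch below assumes a 64-bit numeric dtype (or an 8-byte object pointer for non-numeric columns); the report does not state the dtype explicitly.

```python
n_observations = 597_659
bytes_per_value = 8  # assumption: int64/float64 value or 8-byte object pointer

# Column size in mebibytes (2**20 bytes).
memory_mib = n_observations * bytes_per_value / 2**20

print(f"{memory_mib:.1f} MiB")
```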
Quantile statistics
| Minimum | 9 |
|---|---|
| 5-th percentile | 224820 |
| Q1 | 1089142.5 |
| median | 2135453 |
| Q3 | 3197786.5 |
| 95-th percentile | 4004879.5 |
| Maximum | 4189912 |
| Range | 4189903 |
| Interquartile range (IQR) | 2108644 |
Descriptive statistics
| Standard deviation | 1210381.82 |
|---|---|
| Coefficient of variation (CV) | 0.5670963964 |
| Kurtosis | -1.212187261 |
| Mean | 2134349.342 |
| Median Absolute Deviation (MAD) | 1054616 |
| Skewness | -0.01428844734 |
| Sum | 1.275613093e+12 |
| Variance | 1.465024151e+12 |
| Monotonicity | Strictly increasing |
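Several of the statistics above are derived from others in the same tables. The following sketch recomputes them from the reported minimum, maximum, quartiles, mean and standard deviation. Small differences in the last digits are expected because the inputs are already rounded.

```python
# Values copied from the statistics tables for this variable.
minimum, maximum = 9, 4_189_912
q1, q3 = 1_089_142.5, 3_197_786.5
mean, std = 2_134_349.342, 1_210_381.82
n_observations = 597_659

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
total = mean * n_observations     # Sum (no missing values here)

print(value_range, iqr)
print(f"CV={cv:.4f} variance={variance:.4e} sum={total:.4e}")
```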
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 3149826 | 1 | < 0.1% | |
| 401809 | 1 | < 0.1% | |
| 2570674 | 1 | < 0.1% | |
| 2898338 | 1 | < 0.1% | |
| 3534801 | 1 | < 0.1% | |
| 1526204 | 1 | < 0.1% | |
| 3621309 | 1 | < 0.1% | |
| 3627454 | 1 | < 0.1% | |
| 1354176 | 1 | < 0.1% | |
| 2117380 | 1 | < 0.1% | |
| 297412 | 1 | < 0.1% | |
| 3197442 | 1 | < 0.1% | |
| 2419144 | 1 | < 0.1% | |
| 319945 | 1 | < 0.1% | |
| 2372964 | 1 | < 0.1% | |
| 2415054 | 1 | < 0.1% | |
| 3642793 | 1 | < 0.1% | |
| 3615152 | 1 | < 0.1% | |
| 1578945 | 1 | < 0.1% | |
| 2605485 | 1 | < 0.1% | |
| 1458589 | 1 | < 0.1% | |
| 2492820 | 1 | < 0.1% | |
| 393621 | 1 | < 0.1% | |
| 2496918 | 1 | < 0.1% | |
| 1446295 | 1 | < 0.1% | |
| Other values (597634) | 597634 | > 99.9% |
| Value | Count | Frequency (%) | |
| 9 | 1 | < 0.1% | |
| 17 | 1 | < 0.1% | |
| 25 | 1 | < 0.1% | |
| 26 | 1 | < 0.1% | |
| 34 | 1 | < 0.1% | |
| 38 | 1 | < 0.1% | |
| 43 | 1 | < 0.1% | |
| 64 | 1 | < 0.1% | |
| 70 | 1 | < 0.1% | |
| 73 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 4189912 | 1 | < 0.1% | |
| 4189906 | 1 | < 0.1% | |
| 4189901 | 1 | < 0.1% | |
| 4189893 | 1 | < 0.1% | |
| 4189889 | 1 | < 0.1% | |
| 4189885 | 1 | < 0.1% | |
| 4189876 | 1 | < 0.1% | |
| 4189869 | 1 | < 0.1% | |
| 4189860 | 1 | < 0.1% | |
| 4189854 | 1 | < 0.1% |
index_train
Real number (ℝ≥0)
| Distinct | 25745 |
|---|---|
| Distinct (%) | 4.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.83944119e+14 |
|---|---|
| Minimum | 1.04001941e+11 |
| Maximum | 9.97502819e+14 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 4.6 MiB |
Quantile statistics
| Minimum | 1.04001941e+11 |
|---|---|
| 5-th percentile | 8.300032558e+14 |
| Q1 | 8.634019464e+14 |
| median | 8.931060029e+14 |
| Q3 | 9.37906577e+14 |
| 95-th percentile | 9.74407948e+14 |
| Maximum | 9.97502819e+14 |
| Range | 9.973988171e+14 |
| Interquartile range (IQR) | 7.450463061e+13 |
Descriptive statistics
| Standard deviation | 1.256464252e+14 |
|---|---|
| Coefficient of variation (CV) | 0.1421429505 |
| Kurtosis | 26.61177534 |
| Mean | 8.83944119e+14 |
| Median Absolute Deviation (MAD) | 3.399994807e+13 |
| Skewness | -4.868512912 |
| Sum | 5.282971582e+20 |
| Variance | 1.578702417e+28 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 8.302009469e+14 | 201 | < 0.1% | |
| 8.623059489e+14 | 192 | < 0.1% | |
| 9.70001942e+14 | 183 | < 0.1% | |
| 8.623050029e+14 | 182 | < 0.1% | |
| 8.302000199e+14 | 180 | < 0.1% | |
| 8.302000039e+14 | 176 | < 0.1% | |
| 8.302000049e+14 | 172 | < 0.1% | |
| 8.302009429e+14 | 172 | < 0.1% | |
| 8.302009499e+14 | 170 | < 0.1% | |
| 9.200029309e+14 | 166 | < 0.1% | |
| 8.302000129e+14 | 163 | < 0.1% | |
| 8.302009499e+14 | 163 | < 0.1% | |
| 8.302009449e+14 | 163 | < 0.1% | |
| 8.302000419e+14 | 162 | < 0.1% | |
| 9.70001943e+14 | 160 | < 0.1% | |
| 8.302000029e+14 | 159 | < 0.1% | |
| 8.302000329e+14 | 158 | < 0.1% | |
| 8.302009499e+14 | 156 | < 0.1% | |
| 8.302000029e+14 | 156 | < 0.1% | |
| 8.302000069e+14 | 146 | < 0.1% | |
| 8.623059509e+14 | 145 | < 0.1% | |
| 9.70001945e+14 | 140 | < 0.1% | |
| 8.623050049e+14 | 137 | < 0.1% | |
| 9.82808121e+14 | 136 | < 0.1% | |
| 8.838090309e+14 | 130 | < 0.1% | |
| Other values (25720) | 593591 | 99.3% |
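Some labels in the table above, such as 8.302009499e+14, appear on several rows with different counts. A likely explanation is that distinct 15-digit identifiers are displayed with about ten significant digits and therefore collapse to the same label. The sketch below illustrates the effect with two hypothetical identifiers. Neither the values nor the exact display precision are taken from the report.

```python
# Two hypothetical 15-digit identifiers that differ only in their last digits.
a = 830200949912345.0
b = 830200949943210.0

# Formatting with 10 significant digits (an assumption about the display
# precision) maps both values to the same label.
label_a = f"{a:.10g}"
label_b = f"{b:.10g}"

print(label_a, label_b, a == b)
```

Values of this magnitude are below 2^53, so a 64-bit float can still hold them exactly. Under that assumption the underlying data remain distinct and only the rendering loses digits.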
| Value | Count | Frequency (%) | |
| 1.04001941e+11 | 1 | < 0.1% | |
| 1.049188862e+11 | 1 | < 0.1% | |
| 1.049208862e+11 | 1 | < 0.1% | |
| 1.090500295e+13 | 62 | < 0.1% | |
| 1.090500694e+13 | 31 | < 0.1% | |
| 1.090500794e+13 | 31 | < 0.1% | |
| 1.540003995e+13 | 13 | < 0.1% | |
| 1.540004597e+13 | 1 | < 0.1% | |
| 1.540005395e+13 | 2 | < 0.1% | |
| 1.600989986e+13 | 85 | < 0.1% |
| Value | Count | Frequency (%) | |
| 9.97502819e+14 | 1 | < 0.1% | |
| 9.97502818e+14 | 3 | < 0.1% | |
| 9.97502817e+14 | 1 | < 0.1% | |
| 9.97502816e+14 | 9 | < 0.1% | |
| 9.97502625e+14 | 8 | < 0.1% | |
| 9.97502624e+14 | 7 | < 0.1% | |
| 9.97502622e+14 | 4 | < 0.1% | |
| 9.97502621e+14 | 6 | < 0.1% | |
| 9.9750262e+14 | 8 | < 0.1% | |
| 9.97502619e+14 | 2 | < 0.1% |
car_number
Real number (ℝ≥0)
| Distinct | 324088 |
|---|---|
| Distinct (%) | 54.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 59794537.31 |
|---|---|
| Minimum | 20023164 |
| Maximum | 98099997 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 4.6 MiB |
Quantile statistics
| Minimum | 20023164 |
|---|---|
| 5-th percentile | 34107803.4 |
| Q1 | 53182486 |
| median | 58361635 |
| Q3 | 63184881 |
| 95-th percentile | 94405771.2 |
| Maximum | 98099997 |
| Range | 78076833 |
| Interquartile range (IQR) | 10002395 |
Descriptive statistics
| Standard deviation | 14244038.57 |
|---|---|
| Coefficient of variation (CV) | 0.2382163859 |
| Kurtosis | 1.543241667 |
| Mean | 59794537.31 |
| Median Absolute Deviation (MAD) | 4933874 |
| Skewness | 0.8113420869 |
| Sum | 3.573674337e+13 |
| Variance | 2.028926349e+14 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 37827334 | 57 | < 0.1% | |
| 42189811 | 52 | < 0.1% | |
| 37846763 | 52 | < 0.1% | |
| 37827573 | 51 | < 0.1% | |
| 43807932 | 51 | < 0.1% | |
| 55864821 | 50 | < 0.1% | |
| 55822928 | 47 | < 0.1% | |
| 55927537 | 47 | < 0.1% | |
| 55626428 | 45 | < 0.1% | |
| 55864227 | 44 | < 0.1% | |
| 55701130 | 43 | < 0.1% | |
| 55822944 | 42 | < 0.1% | |
| 32020406 | 42 | < 0.1% | |
| 55997936 | 37 | < 0.1% | |
| 44606663 | 36 | < 0.1% | |
| 55924526 | 36 | < 0.1% | |
| 55952550 | 36 | < 0.1% | |
| 32020257 | 36 | < 0.1% | |
| 55864714 | 35 | < 0.1% | |
| 34164483 | 35 | < 0.1% | |
| 55851810 | 35 | < 0.1% | |
| 55701205 | 34 | < 0.1% | |
| 55839914 | 34 | < 0.1% | |
| 55864862 | 34 | < 0.1% | |
| 55918833 | 34 | < 0.1% | |
| Other values (324063) | 596614 | 99.8% |
| Value | Count | Frequency (%) | |
| 20023164 | 1 | < 0.1% | |
| 21082482 | 1 | < 0.1% | |
| 21094370 | 2 | < 0.1% | |
| 21116231 | 1 | < 0.1% | |
| 21125471 | 1 | < 0.1% | |
| 21132048 | 1 | < 0.1% | |
| 21136163 | 6 | < 0.1% | |
| 21136429 | 4 | < 0.1% | |
| 21136445 | 1 | < 0.1% | |
| 21138474 | 4 | < 0.1% |
| Value | Count | Frequency (%) | |
| 98099997 | 1 | < 0.1% | |
| 98099989 | 1 | < 0.1% | |
| 98099971 | 1 | < 0.1% | |
| 98099963 | 1 | < 0.1% | |
| 98099955 | 1 | < 0.1% | |
| 98099948 | 1 | < 0.1% | |
| 98099930 | 1 | < 0.1% | |
| 98099922 | 1 | < 0.1% | |
| 98099914 | 1 | < 0.1% | |
| 98099906 | 1 | < 0.1% |
| Distinct | 712 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 123 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 918759.5164 |
|---|---|
| Minimum | 830003 |
| Maximum | 998100 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 4.6 MiB |
Quantile statistics
| Minimum | 830003 |
|---|---|
| 5-th percentile | 841604 |
| Q1 | 871906 |
| median | 925701 |
| Q3 | 967808 |
| 95-th percentile | 986103 |
| Maximum | 998100 |
| Range | 168097 |
| Interquartile range (IQR) | 95902 |
Descriptive statistics
| Standard deviation | 48499.17883 |
|---|---|
| Coefficient of variation (CV) | 0.05278767508 |
| Kurtosis | -1.297601752 |
| Mean | 918759.5164 |
| Median Absolute Deviation (MAD) | 44293 |
| Skewness | -0.07081747804 |
| Sum | 5.489918864e+11 |
| Variance | 2352170347 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 986103 | 37520 | 6.3% | |
| 967808 | 24193 | 4.0% | |
| 864207 | 11985 | 2.0% | |
| 937906 | 11233 | 1.9% | |
| 985702 | 10912 | 1.8% | |
| 932207 | 10487 | 1.8% | |
| 863007 | 10085 | 1.7% | |
| 984700 | 9839 | 1.6% | |
| 831504 | 9551 | 1.6% | |
| 946801 | 9356 | 1.6% | |
| 887904 | 8611 | 1.4% | |
| 887603 | 8476 | 1.4% | |
| 883809 | 8026 | 1.3% | |
| 980200 | 7048 | 1.2% | |
| 987801 | 6823 | 1.1% | |
| 893106 | 6682 | 1.1% | |
| 881408 | 6443 | 1.1% | |
| 925701 | 6034 | 1.0% | |
| 864902 | 5993 | 1.0% | |
| 970001 | 5989 | 1.0% | |
| 862108 | 5982 | 1.0% | |
| 860206 | 5726 | 1.0% | |
| 862201 | 5599 | 0.9% | |
| 970406 | 5473 | 0.9% | |
| 892103 | 5435 | 0.9% | |
| Other values (687) | 354035 | 59.2% |
| Value | Count | Frequency (%) | |
| 830003 | 809 | 0.1% | |
| 830107 | 960 | 0.2% | |
| 830200 | 447 | 0.1% | |
| 830304 | 1068 | 0.2% | |
| 830709 | 798 | 0.1% | |
| 831203 | 1067 | 0.2% | |
| 831400 | 2496 | 0.4% | |
| 831504 | 9551 | 1.6% | |
| 831608 | 170 | < 0.1% | |
| 831805 | 115 | < 0.1% |
| Value | Count | Frequency (%) | |
| 998100 | 69 | < 0.1% | |
| 997502 | 46 | < 0.1% | |
| 997409 | 4 | < 0.1% | |
| 997108 | 10 | < 0.1% | |
| 996904 | 144 | < 0.1% | |
| 996800 | 1 | < 0.1% | |
| 996603 | 24 | < 0.1% | |
| 996302 | 46 | < 0.1% | |
| 995808 | 19 | < 0.1% | |
| 995507 | 95 | < 0.1% |
loaded
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.6 MiB |
| Value | Count | Frequency (%) | |
| 1 | 327021 | 54.7% | |
| 2 | 270638 | 45.3% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) | |
| . | 597659 | 33.3% | |
| 0 | 597659 | 33.3% | |
| 1 | 327021 | 18.2% | |
| 2 | 270638 | 15.1% |
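The character statistics follow from the value counts if the two categories are stored as the three-character strings "1.0" and "2.0". That string form is an assumption, chosen because it matches the reported length of 3 and the presence of `.` and `0` in every value. A short sketch:

```python
from collections import Counter

# Value counts from the frequency table for this variable; the string
# form "1.0" / "2.0" is assumed, not stated in the report.
value_counts = {"1.0": 327_021, "2.0": 270_638}

# Accumulate how often each character occurs across all rows.
char_counts = Counter()
for text, count in value_counts.items():
    for ch in text:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
for ch, count in char_counts.most_common():
    print(repr(ch), count, f"{100 * count / total_chars:.1f}%")
```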
Most occurring categories
| Value | Count | Frequency (%) | |
| Decimal Number | 1195318 | 66.7% | |
| Other Punctuation | 597659 | 33.3% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 0 | 597659 | 50.0% | |
| 1 | 327021 | 27.4% | |
| 2 | 270638 | 22.6% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| . | 597659 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Common | 1792977 | 100.0% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| . | 597659 | 33.3% | |
| 0 | 597659 | 33.3% | |
| 1 | 327021 | 18.2% | |
| 2 | 270638 | 15.1% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 1792977 | 100.0% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| . | 597659 | 33.3% | |
| 0 | 597659 | 33.3% | |
| 1 | 327021 | 18.2% | |
| 2 | 270638 | 15.1% |
operation_car
Categorical
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.6 MiB |
| Value | Count | Frequency (%) | |
| 3 | 597659 | 100.0% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) | |
| 3 | 597659 | 33.3% | |
| . | 597659 | 33.3% | |
| 0 | 597659 | 33.3% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Decimal Number | 1195318 | 66.7% | |
| Other Punctuation | 597659 | 33.3% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 3 | 597659 | 50.0% | |
| 0 | 597659 | 50.0% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| . | 597659 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Common | 1792977 | 100.0% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| 3 | 597659 | 33.3% | |
| . | 597659 | 33.3% | |
| 0 | 597659 | 33.3% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 1792977 | 100.0% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| 3 | 597659 | 33.3% | |
| . | 597659 | 33.3% | |
| 0 | 597659 | 33.3% |
operation_date
Categorical
| Distinct | 19127 |
|---|---|
| Distinct (%) | 3.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.6 MiB |
| Value | Count | Frequency (%) | |
| 2020-07-28 18:01:00 | 272 | < 0.1% | |
| 2020-07-28 17:40:00 | 265 | < 0.1% | |
| 2020-07-15 20:00:00 | 261 | < 0.1% | |
| 2020-07-18 09:30:00 | 259 | < 0.1% | |
| 2020-07-20 06:30:00 | 249 | < 0.1% | |
| 2020-07-18 06:40:00 | 248 | < 0.1% | |
| 2020-07-29 18:01:00 | 241 | < 0.1% | |
| 2020-07-17 08:00:00 | 240 | < 0.1% | |
| 2020-07-17 19:00:00 | 239 | < 0.1% | |
| 2020-07-18 19:00:00 | 236 | < 0.1% | |
| 2020-07-30 13:41:00 | 235 | < 0.1% | |
| 2020-07-26 13:41:00 | 234 | < 0.1% | |
| 2020-07-27 14:20:00 | 228 | < 0.1% | |
| 2020-07-20 08:02:00 | 224 | < 0.1% | |
| 2020-07-23 06:10:00 | 220 | < 0.1% | |
| 2020-07-17 05:30:00 | 219 | < 0.1% | |
| 2020-07-10 23:30:00 | 216 | < 0.1% | |
| 2020-07-17 16:10:00 | 216 | < 0.1% | |
| 2020-07-18 12:00:00 | 216 | < 0.1% | |
| 2020-07-29 00:20:00 | 216 | < 0.1% | |
| 2020-07-30 04:00:00 | 215 | < 0.1% | |
| 2020-07-29 20:30:00 | 214 | < 0.1% | |
| 2020-07-25 07:15:00 | 213 | < 0.1% | |
| 2020-07-20 14:20:00 | 212 | < 0.1% | |
| 2020-07-18 18:01:00 | 210 | < 0.1% | |
| Other values (19102) | 591861 | 99.0% |
Frequencies of value counts
Unique
| Unique | 2329 |
|---|---|
| Unique (%) | 0.4% |
Histogram of lengths of the category
Length
| Max length | 19 |
|---|---|
| Median length | 19 |
| Mean length | 19 |
| Min length | 19 |
Most occurring characters
| Value | Count | Frequency (%) | |
| 0 | 3673907 | 32.4% | |
| 2 | 1838725 | 16.2% | |
| - | 1195318 | 10.5% | |
| : | 1195318 | 10.5% | |
| 7 | 774836 | 6.8% | |
| 1 | 734254 | 6.5% | |
| (space) | 597659 | 5.3% | |
| 3 | 299061 | 2.6% | |
| 5 | 293183 | 2.6% | |
| 4 | 231481 | 2.0% | |
| 8 | 178961 | 1.6% | |
| 6 | 174829 | 1.5% | |
| 9 | 167989 | 1.5% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Decimal Number | 8367226 | 73.7% | |
| Dash Punctuation | 1195318 | 10.5% | |
| Other Punctuation | 1195318 | 10.5% | |
| Space Separator | 597659 | 5.3% |
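The category totals above are what a fixed 19-character `YYYY-MM-DD HH:MM:SS` layout produces when multiplied by the row count. A sketch using the Unicode general categories from the standard library, with the most frequent value of this variable as the sample string:

```python
import unicodedata
from collections import Counter

n_observations = 597_659
sample = "2020-07-28 18:01:00"  # most frequent value reported for this variable

# Unicode general category of each character: Nd = decimal number,
# Pd = dash punctuation, Po = other punctuation, Zs = space separator.
per_value = Counter(unicodedata.category(ch) for ch in sample)

# Every value shares the same 19-character layout, so scale by the row count.
totals = {cat: n * n_observations for cat, n in per_value.items()}

print(dict(per_value))
print(totals, sum(totals.values()))
```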
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 0 | 3673907 | 43.9% | |
| 2 | 1838725 | 22.0% | |
| 7 | 774836 | 9.3% | |
| 1 | 734254 | 8.8% | |
| 3 | 299061 | 3.6% | |
| 5 | 293183 | 3.5% | |
| 4 | 231481 | 2.8% | |
| 8 | 178961 | 2.1% | |
| 6 | 174829 | 2.1% | |
| 9 | 167989 | 2.0% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
| - | 1195318 | 100.0% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 597659 | 100.0% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| : | 1195318 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Common | 11355521 | 100.0% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| 0 | 3673907 | 32.4% | |
| 2 | 1838725 | 16.2% | |
| - | 1195318 | 10.5% | |
| : | 1195318 | 10.5% | |
| 7 | 774836 | 6.8% | |
| 1 | 734254 | 6.5% | |
| (space) | 597659 | 5.3% | |
| 3 | 299061 | 2.6% | |
| 5 | 293183 | 2.6% | |
| 4 | 231481 | 2.0% | |
| 8 | 178961 | 1.6% | |
| 6 | 174829 | 1.5% | |
| 9 | 167989 | 1.5% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 11355521 | 100.0% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| 0 | 3673907 | 32.4% | |
| 2 | 1838725 | 16.2% | |
| - | 1195318 | 10.5% | |
| : | 1195318 | 10.5% | |
| 7 | 774836 | 6.8% | |
| 1 | 734254 | 6.5% | |
| (space) | 597659 | 5.3% | |
| 3 | 299061 | 2.6% | |
| 5 | 293183 | 2.6% | |
| 4 | 231481 | 2.0% | |
| 8 | 178961 | 1.6% | |
| 6 | 174829 | 1.5% | |
| 9 | 167989 | 1.5% |
| Distinct | 690 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 123 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 918726.0711 |
|---|---|
| Minimum | 830003 |
| Maximum | 998100 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 4.6 MiB |
Quantile statistics
| Minimum | 830003 |
|---|---|
| 5-th percentile | 841604 |
| Q1 | 871906 |
| median | 925701 |
| Q3 | 967600 |
| 95-th percentile | 985906 |
| Maximum | 998100 |
| Range | 168097 |
| Interquartile range (IQR) | 95694 |
Descriptive statistics
| Standard deviation | 48460.41996 |
|---|---|
| Coefficient of variation (CV) | 0.05274740914 |
| Kurtosis | -1.297470179 |
| Mean | 918726.0711 |
| Median Absolute Deviation (MAD) | 44293 |
| Skewness | -0.07187732784 |
| Sum | 5.489719016e+11 |
| Variance | 2348412303 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 985906 | 39261 | 6.6% | |
| 967600 | 25821 | 4.3% | |
| 864207 | 11985 | 2.0% | |
| 937906 | 11233 | 1.9% | |
| 946801 | 11210 | 1.9% | |
| 985609 | 11037 | 1.8% | |
| 980003 | 10676 | 1.8% | |
| 932207 | 10487 | 1.8% | |
| 984502 | 10463 | 1.8% | |
| 863007 | 10085 | 1.7% | |
| 831504 | 9551 | 1.6% | |
| 887904 | 8611 | 1.4% | |
| 887603 | 8476 | 1.4% | |
| 883809 | 8026 | 1.3% | |
| 936903 | 7614 | 1.3% | |
| 987708 | 6840 | 1.1% | |
| 893106 | 6682 | 1.1% | |
| 881408 | 6443 | 1.1% | |
| 925701 | 6034 | 1.0% | |
| 864902 | 5993 | 1.0% | |
| 970001 | 5989 | 1.0% | |
| 862108 | 5982 | 1.0% | |
| 860206 | 5726 | 1.0% | |
| 970406 | 5473 | 0.9% | |
| 892103 | 5435 | 0.9% | |
| Other values (665) | 342403 | 57.3% |
| Value | Count | Frequency (%) | |
| 830003 | 809 | 0.1% | |
| 830107 | 960 | 0.2% | |
| 830200 | 447 | 0.1% | |
| 830304 | 1068 | 0.2% | |
| 830709 | 798 | 0.1% | |
| 831203 | 1067 | 0.2% | |
| 831400 | 2496 | 0.4% | |
| 831504 | 9551 | 1.6% | |
| 831608 | 170 | < 0.1% | |
| 831805 | 115 | < 0.1% |
| Value | Count | Frequency (%) | |
| 998100 | 69 | < 0.1% | |
| 997502 | 46 | < 0.1% | |
| 997409 | 4 | < 0.1% | |
| 997108 | 10 | < 0.1% | |
| 996904 | 144 | < 0.1% | |
| 996800 | 1 | < 0.1% | |
| 996603 | 24 | < 0.1% | |
| 996302 | 46 | < 0.1% | |
| 995808 | 19 | < 0.1% | |
| 995507 | 95 | < 0.1% |
| Distinct | 690 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 123 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2000531568 |
|---|---|
| Minimum | 2000035070 |
| Maximum | 2002025667 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 4.6 MiB |
Quantile statistics
| Minimum | 2000035070 |
|---|---|
| 5-th percentile | 2000035530 |
| Q1 | 2000036950 |
| median | 2000038600 |
| Q3 | 2001930530 |
| 95-th percentile | 2001933470 |
| Maximum | 2002025667 |
| Range | 1990597 |
| Interquartile range (IQR) | 1893580 |
Descriptive statistics
| Standard deviation | 831630.3764 |
|---|---|
| Coefficient of variation (CV) | 0.0004157047007 |
| Kurtosis | -0.8133419193 |
| Mean | 2000531568 |
| Median Absolute Deviation (MAD) | 1712 |
| Skewness | 1.089323257 |
| Sum | 1.195389631e+15 |
| Variance | 6.91609083e+11 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 2000038976 | 39261 | 6.6% | |
| 2000038600 | 25821 | 4.3% | |
| 2001930816 | 11985 | 2.0% | |
| 2000037532 | 11233 | 1.9% | |
| 2000037862 | 11210 | 1.9% | |
| 2000038970 | 11037 | 1.8% | |
| 2000038840 | 10676 | 1.8% | |
| 2000037064 | 10487 | 1.8% | |
| 2000038950 | 10463 | 1.8% | |
| 2001933494 | 10085 | 1.7% | |
| 2001930534 | 9551 | 1.6% | |
| 2000035564 | 8611 | 1.4% | |
| 2000035530 | 8476 | 1.4% | |
| 2000035252 | 8026 | 1.3% | |
| 2000037498 | 7614 | 1.3% | |
| 2000039016 | 6840 | 1.1% | |
| 2000035966 | 6682 | 1.1% | |
| 2000035194 | 6443 | 1.1% | |
| 2000036868 | 6034 | 1.0% | |
| 2000039908 | 5993 | 1.0% | |
| 2000038624 | 5989 | 1.0% | |
| 2001930794 | 5982 | 1.0% | |
| 2001930760 | 5726 | 1.0% | |
| 2000038634 | 5473 | 0.9% | |
| 2000035890 | 5435 | 0.9% | |
| Other values (665) | 342403 | 57.3% |
| Value | Count | Frequency (%) | |
| 2000035070 | 2 | < 0.1% | |
| 2000035090 | 20 | < 0.1% | |
| 2000035110 | 310 | 0.1% | |
| 2000035130 | 111 | < 0.1% | |
| 2000035140 | 313 | 0.1% | |
| 2000035162 | 466 | 0.1% | |
| 2000035176 | 2 | < 0.1% | |
| 2000035182 | 532 | 0.1% | |
| 2000035194 | 6443 | 1.1% | |
| 2000035212 | 68 | < 0.1% |
| Value | Count | Frequency (%) | |
| 2002025667 | 32 | < 0.1% | |
| 2002023867 | 134 | < 0.1% | |
| 2002023505 | 21 | < 0.1% | |
| 2002023503 | 14 | < 0.1% | |
| 2001933538 | 626 | 0.1% | |
| 2001933536 | 1388 | 0.2% | |
| 2001933530 | 1546 | 0.3% | |
| 2001933522 | 3002 | 0.5% | |
| 2001933520 | 82 | < 0.1% | |
| 2001933518 | 2099 | 0.4% |
operation_train
Real number (ℝ≥0)
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.28559477 |
|---|---|
| Minimum | 4 |
| Maximum | 72 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 4.6 MiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 4 |
| median | 4 |
| Q3 | 4 |
| 95-th percentile | 72 |
| Maximum | 72 |
| Range | 68 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 19.43096262 |
|---|---|
| Coefficient of variation (CV) | 1.889143316 |
| Kurtosis | 5.946123616 |
| Mean | 10.28559477 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.804980394 |
| Sum | 6147268 |
| Variance | 377.5623081 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=5)
| Value | Count | Frequency (%) | |
| 4 | 540141 | 90.4% | |
| 72 | 51977 | 8.7% | |
| 44 | 5509 | 0.9% | |
| 64 | 29 | < 0.1% | |
| 54 | 2 | < 0.1% | |
| (Missing) | 1 | < 0.1% |
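With only five distinct values, the summary statistics for operation_train can be recomputed from the frequency table alone. The sketch below derives the sum, mean and variance. It uses the sample variance (divisor n - 1), which appears to match the reported figure; that choice is inferred from the numbers rather than stated in the report.

```python
# Value counts from the table above (the single missing row is excluded).
value_counts = {4: 540_141, 72: 51_977, 44: 5_509, 64: 29, 54: 2}

n = sum(value_counts.values())
total = sum(value * count for value, count in value_counts.items())
mean = total / n

# Sample variance: squared deviations weighted by count, divided by n - 1.
sum_sq_dev = sum(count * (value - mean) ** 2
                 for value, count in value_counts.items())
variance = sum_sq_dev / (n - 1)

print(n, total, round(mean, 4), round(variance, 2))
```

Since 90.4% of the rows hold the value 4, the quartiles all equal 4, which is why the IQR and the MAD are both 0 for this variable.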
| Value | Count | Frequency (%) | |
| 4 | 540141 | 90.4% | |
| 44 | 5509 | 0.9% | |
| 54 | 2 | < 0.1% | |
| 64 | 29 | < 0.1% | |
| 72 | 51977 | 8.7% |
| Value | Count | Frequency (%) | |
| 72 | 51977 | 8.7% | |
| 64 | 29 | < 0.1% | |
| 54 | 2 | < 0.1% | |
| 44 | 5509 | 0.9% | |
| 4 | 540141 | 90.4% |
rodvag
Real number (ℝ≥0)
| Distinct | 11 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 64.93859207 |
|---|---|
| Minimum | 20 |
| Maximum | 99 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 4.6 MiB |
Quantile statistics
| Minimum | 20 |
|---|---|
| 5-th percentile | 40 |
| Q1 | 60 |
| median | 60 |
| Q3 | 70 |
| 95-th percentile | 96 |
| Maximum | 99 |
| Range | 79 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 16.68273849 |
|---|---|
| Coefficient of variation (CV) | 0.2569002184 |
| Kurtosis | 0.7991182928 |
| Mean | 64.93859207 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 0.03813747555 |
| Sum | 38811134 |
| Variance | 278.3137635 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=11)
| Value | Count | Frequency (%) | |
| 60 | 352969 | 59.1% | |
| 70 | 73720 | 12.3% | |
| 96 | 51398 | 8.6% | |
| 90 | 49909 | 8.4% | |
| 40 | 35545 | 5.9% | |
| 20 | 21155 | 3.5% | |
| 93 | 4773 | 0.8% | |
| 95 | 4383 | 0.7% | |
| 92 | 1993 | 0.3% | |
| 87 | 1795 | 0.3% | |
| 99 | 19 | < 0.1% |
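The quantile statistics for rodvag can be read off the cumulative counts of the frequency table. The helper below, `quantile_from_counts`, is a hypothetical illustration that returns the smallest value whose cumulative count reaches the requested fraction. It does not interpolate, so it is a simplification of whatever quantile method the report uses, but it gives the same answers for this discrete column.

```python
# Value counts from the table above.
value_counts = {60: 352_969, 70: 73_720, 96: 51_398, 90: 49_909, 40: 35_545,
                20: 21_155, 93: 4_773, 95: 4_383, 92: 1_993, 87: 1_795, 99: 19}

def quantile_from_counts(counts, q):
    """Smallest value whose cumulative count reaches q * n (no interpolation)."""
    n = sum(counts.values())
    running = 0
    for value in sorted(counts):
        running += counts[value]
        if running >= q * n:
            return value
    return max(counts)

quantiles = {q: quantile_from_counts(value_counts, q)
             for q in (0.05, 0.25, 0.50, 0.75, 0.95)}
print(quantiles)
```

Because 59.1% of the rows equal the median of 60, more than half of the absolute deviations from the median are zero, which explains the reported MAD of 0.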
| Value | Count | Frequency (%) | |
| 20 | 21155 | 3.5% | |
| 40 | 35545 | 5.9% | |
| 60 | 352969 | 59.1% | |
| 70 | 73720 | 12.3% | |
| 87 | 1795 | 0.3% | |
| 90 | 49909 | 8.4% | |
| 92 | 1993 | 0.3% | |
| 93 | 4773 | 0.8% | |
| 95 | 4383 | 0.7% | |
| 96 | 51398 | 8.6% |
| Value | Count | Frequency (%) | |
| 99 | 19 | < 0.1% | |
| 96 | 51398 | 8.6% | |
| 95 | 4383 | 0.7% | |
| 93 | 4773 | 0.8% | |
| 92 | 1993 | 0.3% | |
| 90 | 49909 | 8.4% | |
| 87 | 1795 | 0.3% | |
| 70 | 73720 | 12.3% | |
| 60 | 352969 | 59.1% | |
| 40 | 35545 | 5.9% |
rod_train
Real number (ℝ≥0)
| Distinct | 22 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 355400 |
| Missing (%) | 59.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.05333135 |
|---|---|
| Minimum | 3 |
| Maximum | 89 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 4.6 MiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 10 |
| Q1 | 10 |
| median | 50 |
| Q3 | 52 |
| 95-th percentile | 83 |
| Maximum | 89 |
| Range | 86 |
| Interquartile range (IQR) | 42 |
Descriptive statistics
| Standard deviation | 23.78135046 |
|---|---|
| Coefficient of variation (CV) | 0.6249479248 |
| Kurtosis | -1.18198489 |
| Mean | 38.05333135 |
| Median Absolute Deviation (MAD) | 20 |
| Skewness | 0.1221588189 |
| Sum | 9218762 |
| Variance | 565.5526297 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=22)
| Value | Count | Frequency (%) | |
| 10 | 82896 | 13.9% | |
| 50 | 37122 | 6.2% | |
| 52 | 34057 | 5.7% | |
| 40 | 16313 | 2.7% | |
| 63 | 14567 | 2.4% | |
| 83 | 12848 | 2.1% | |
| 30 | 9930 | 1.7% | |
| 20 | 9134 | 1.5% | |
| 55 | 9028 | 1.5% | |
| 72 | 8182 | 1.4% | |
| 58 | 4098 | 0.7% | |
| 81 | 1771 | 0.3% | |
| 89 | 1123 | 0.2% | |
| 56 | 519 | 0.1% | |
| 87 | 166 | < 0.1% | |
| 88 | 163 | < 0.1% | |
| 57 | 132 | < 0.1% | |
| 82 | 121 | < 0.1% | |
| 53 | 38 | < 0.1% | |
| 3 | 29 | < 0.1% | |
| 64 | 21 | < 0.1% | |
| 66 | 1 | < 0.1% | |
| (Missing) | 355400 | 59.5% |
| Value | Count | Frequency (%) | |
| 3 | 29 | < 0.1% | |
| 10 | 82896 | 13.9% | |
| 20 | 9134 | 1.5% | |
| 30 | 9930 | 1.7% | |
| 40 | 16313 | 2.7% | |
| 50 | 37122 | 6.2% | |
| 52 | 34057 | 5.7% | |
| 53 | 38 | < 0.1% | |
| 55 | 9028 | 1.5% | |
| 56 | 519 | 0.1% |
| Value | Count | Frequency (%) | |
| 89 | 1123 | 0.2% | |
| 88 | 163 | < 0.1% | |
| 87 | 166 | < 0.1% | |
| 83 | 12848 | 2.1% | |
| 82 | 121 | < 0.1% | |
| 81 | 1771 | 0.3% | |
| 72 | 8182 | 1.4% | |
| 66 | 1 | < 0.1% | |
| 64 | 21 | < 0.1% | |
| 63 | 14567 | 2.4% |
| Distinct | 721 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 918555.0539 |
|---|---|
| Minimum | 104 |
| Maximum | 998100 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 4.6 MiB |
Quantile statistics
| Minimum | 104 |
|---|---|
| 5-th percentile | 841402 |
| Q1 | 872504 |
| median | 925701 |
| Q3 | 967600 |
| 95-th percentile | 985906 |
| Maximum | 998100 |
| Range | 997996 |
| Interquartile range (IQR) | 95096 |
Descriptive statistics
| Standard deviation | 50208.59089 |
|---|---|
| Coefficient of variation (CV) | 0.0546604046 |
| Kurtosis | 20.86284099 |
| Mean | 918555.0539 |
| Median Absolute Deviation (MAD) | 44293 |
| Skewness | -1.286791911 |
| Sum | 5.489826949e+11 |
| Variance | 2520902599 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 985906 | 39261 | 6.6% | |
| 967600 | 25838 | 4.3% | |
| 864207 | 11986 | 2.0% | |
| 937906 | 11302 | 1.9% | |
| 946801 | 11227 | 1.9% | |
| 985609 | 11037 | 1.8% | |
| 980003 | 10862 | 1.8% | |
| 932207 | 10550 | 1.8% | |
| 984502 | 10463 | 1.8% | |
| 863007 | 10087 | 1.7% | |
| 831504 | 9551 | 1.6% | |
| 887904 | 8968 | 1.5% | |
| 887603 | 8497 | 1.4% | |
| 883809 | 8026 | 1.3% | |
| 936903 | 7641 | 1.3% | |
| 987708 | 6840 | 1.1% | |
| 893106 | 6682 | 1.1% | |
| 970001 | 6460 | 1.1% | |
| 881408 | 6443 | 1.1% | |
| 925701 | 6073 | 1.0% | |
| 864902 | 6042 | 1.0% | |
| 862108 | 6000 | 1.0% | |
| 860206 | 5742 | 1.0% | |
| 892103 | 5475 | 0.9% | |
| 970406 | 5473 | 0.9% | |
| Other values (696) | 341133 | 57.1% |
| Value | Count | Frequency (%) | |
| 104 | 2 | < 0.1% | |
| 706 | 4 | < 0.1% | |
| 1107 | 20 | < 0.1% | |
| 1200 | 53 | < 0.1% | |
| 1802 | 35 | < 0.1% | |
| 60001 | 7 | < 0.1% | |
| 183502 | 1 | < 0.1% | |
| 230600 | 1 | < 0.1% | |
| 820001 | 186 | < 0.1% | |
| 830003 | 1059 | 0.2% |
| Value | Count | Frequency (%) | |
| 998100 | 69 | < 0.1% | |
| 997502 | 55 | < 0.1% | |
| 997409 | 4 | < 0.1% | |
| 997108 | 1 | < 0.1% | |
| 996904 | 140 | < 0.1% | |
| 996800 | 2 | < 0.1% | |
| 996603 | 23 | < 0.1% | |
| 996302 | 50 | < 0.1% | |
| 995808 | 19 | < 0.1% | |
| 995507 | 95 | < 0.1% |
| Distinct | 711 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 253 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2000532784 |
|---|---|
| Minimum | 2000035070 |
| Maximum | 2002030161 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 4.6 MiB |
Quantile statistics
| Minimum | 2000035070 |
|---|---|
| 5-th percentile | 2000035530 |
| Q1 | 2000036954 |
| median | 2000038600 |
| Q3 | 2001930530 |
| 95-th percentile | 2001933470 |
| Maximum | 2002030161 |
| Range | 1995091 |
| Interquartile range (IQR) | 1893576 |
Descriptive statistics
| Standard deviation | 832350.2392 |
|---|---|
| Coefficient of variation (CV) | 0.0004160642835 |
| Kurtosis | -0.8211407721 |
| Mean | 2000532784 |
| Median Absolute Deviation (MAD) | 1712 |
| Skewness | 1.08571745 |
| Sum | 1.195130288e+15 |
| Variance | 6.928069207e+11 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 2000038976 | 39261 | 6.6% | |
| 2000038600 | 25838 | 4.3% | |
| 2001930816 | 11986 | 2.0% | |
| 2000037532 | 11302 | 1.9% | |
| 2000037862 | 11227 | 1.9% | |
| 2000038970 | 11037 | 1.8% | |
| 2000038840 | 10862 | 1.8% | |
| 2000037064 | 10550 | 1.8% | |
| 2000038950 | 10463 | 1.8% | |
| 2001933494 | 10087 | 1.7% | |
| 2001930534 | 9551 | 1.6% | |
| 2000035564 | 8968 | 1.5% | |
| 2000035530 | 8497 | 1.4% | |
| 2000035252 | 8026 | 1.3% | |
| 2000037498 | 7641 | 1.3% | |
| 2000039016 | 6840 | 1.1% | |
| 2000035966 | 6682 | 1.1% | |
| 2000038624 | 6460 | 1.1% | |
| 2000035194 | 6443 | 1.1% | |
| 2000036868 | 6073 | 1.0% | |
| 2000039908 | 6042 | 1.0% | |
| 2001930794 | 6000 | 1.0% | |
| 2001930760 | 5742 | 1.0% | |
| 2000035890 | 5475 | 0.9% | |
| 2000038634 | 5473 | 0.9% | |
| Other values (686) | 340880 | 57.0% |

| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2000035070 | 1 | < 0.1% | |
| 2000035090 | 20 | < 0.1% | |
| 2000035110 | 581 | 0.1% | |
| 2000035130 | 278 | < 0.1% | |
| 2000035140 | 200 | < 0.1% | |
| 2000035162 | 335 | 0.1% | |
| 2000035176 | 2 | < 0.1% | |
| 2000035182 | 516 | 0.1% | |
| 2000035194 | 6443 | 1.1% | |
| 2000035212 | 33 | < 0.1% |

| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2002030161 | 1 | < 0.1% | |
| 2002030159 | 4 | < 0.1% | |
| 2002030157 | 1 | < 0.1% | |
| 2002026609 | 2 | < 0.1% | |
| 2002025757 | 1 | < 0.1% | |
| 2002025683 | 1 | < 0.1% | |
| 2002025669 | 3 | < 0.1% | |
| 2002025667 | 32 | < 0.1% | |
| 2002025661 | 37 | < 0.1% | |
| 2002025657 | 6 | < 0.1% |
| Distinct | 4706 |
|---|---|
| Distinct (%) | 1.9% |
| Missing | 355394 |
| Missing (%) | 59.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3116.659988 |
|---|---|
| Minimum | 0 |
| Maximum | 8962 |
| Zeros | 1061 |
| Zeros (%) | 0.2% |
| Memory size | 4.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 433 |
| Q1 | 1532 |
| median | 2399 |
| Q3 | 5374 |
| 95-th percentile | 6291 |
| Maximum | 8962 |
| Range | 8962 |
| Interquartile range (IQR) | 3842 |
Descriptive statistics
| Standard deviation | 2038.150955 |
|---|---|
| Coefficient of variation (CV) | 0.6539535794 |
| Kurtosis | -1.21746278 |
| Mean | 3116.659988 |
| Median Absolute Deviation (MAD) | 1324 |
| Skewness | 0.4401580793 |
| Sum | 755057632 |
| Variance | 4154059.315 |
| Monotonicity | Not monotonic |
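Several rows in the tables above are derived from others. The following minimal sketch recomputes them from the reported figures for this variable, as a consistency check. The variable names are illustrative only, and the non-missing count is assumed to be the number of observations minus the missing count.

```python
# Figures copied from the report tables above (this variable only).
n_rows = 597659            # Number of observations (dataset overview)
n_missing = 355394         # Missing
minimum, q1, q3, maximum = 0, 1532, 5374, 8962
mean = 3116.659988
std = 2038.150955

# Assumption: statistics are computed over non-missing values only.
n_present = n_rows - n_missing

value_range = maximum - minimum          # Range
iqr = q3 - q1                            # Interquartile range (IQR)
cv = std / mean                          # Coefficient of variation (CV)
variance = std ** 2                      # Variance
total = mean * n_present                 # Sum (approximate, mean is rounded)
missing_pct = 100 * n_missing / n_rows   # Missing (%)

print(value_range, iqr, round(cv, 6), round(variance, 1),
      round(total), round(missing_pct, 1))
```

Because the mean and standard deviation are printed to a limited number of digits, the recomputed sum and variance agree with the report only up to rounding.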
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1061 | 0.2% | |
| 6278 | 717 | 0.1% | |
| 1697 | 696 | 0.1% | |
| 6277 | 643 | 0.1% | |
| 1696 | 640 | 0.1% | |
| 1170 | 602 | 0.1% | |
| 6301 | 561 | 0.1% | |
| 1679 | 557 | 0.1% | |
| 1702 | 555 | 0.1% | |
| 1699 | 533 | 0.1% | |
| 1680 | 523 | 0.1% | |
| 6269 | 494 | 0.1% | |
| 6279 | 488 | 0.1% | |
| 6267 | 478 | 0.1% | |
| 6276 | 454 | 0.1% | |
| 6259 | 440 | 0.1% | |
| 1717 | 434 | 0.1% | |
| 6262 | 428 | 0.1% | |
| 1681 | 426 | 0.1% | |
| 6274 | 415 | 0.1% | |
| 6300 | 415 | 0.1% | |
| 6282 | 409 | 0.1% | |
| 1692 | 409 | 0.1% | |
| 1706 | 403 | 0.1% | |
| 1008 | 401 | 0.1% | |
| Other values (4681) | 229083 | 38.3% | |
| (Missing) | 355394 | 59.5% |

| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1061 | 0.2% | |
| 19 | 1 | < 0.1% | |
| 21 | 90 | < 0.1% | |
| 22 | 39 | < 0.1% | |
| 23 | 22 | < 0.1% | |
| 24 | 157 | < 0.1% | |
| 25 | 56 | < 0.1% | |
| 26 | 49 | < 0.1% | |
| 27 | 24 | < 0.1% | |
| 28 | 5 | < 0.1% |

| Value | Count | Frequency (%) | |
|---|---|---|---|
| 8962 | 1 | < 0.1% | |
| 7678 | 82 | < 0.1% | |
| 7198 | 38 | < 0.1% | |
| 7108 | 12 | < 0.1% | |
| 7067 | 71 | < 0.1% | |
| 7066 | 71 | < 0.1% | |
| 7065 | 71 | < 0.1% | |
| 7063 | 71 | < 0.1% | |
| 7062 | 71 | < 0.1% | |
| 7061 | 91 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a perfectly linear relationship the steepness of the line does not affect r (only the sign of the slope does). To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
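The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report generator uses. The function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

# Shifting and rescaling one variable (with a positive factor) leaves r unchanged.
x_shifted_scaled = [10.0 + 3.0 * v for v in x]
r_original = pearson_r(x, y)
r_transformed = pearson_r(x_shifted_scaled, y)

# A perfectly linear relationship gives +1 or -1 regardless of steepness.
r_steep = pearson_r(x, [100.0 * v for v in x])
r_negative = pearson_r(x, [-0.5 * v for v in x])
```

Comparing `r_original` with `r_transformed` illustrates the invariance under changes in location and scale mentioned in the text.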
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
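As a minimal sketch of that definition, the snippet below ranks both variables and then applies the Pearson formula to the ranks. Ties are given their average rank, which is a common convention but an assumption here. The helper names are hypothetical.

```python
import math

def _pearson(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    var_x = sum((a - mx) ** 2 for a in xs) / n
    var_y = sum((b - my) ** 2 for b in ys) / n
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return _pearson(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y_cubic = [v ** 3 for v in x]          # monotonic but not linear

rho = spearman_rho(x, y_cubic)         # perfect monotonic association
r = _pearson(x, y_cubic)               # linear measure is lower
```

On the cubic example, ρ reaches 1 while Pearson's r stays below 1, which is the difference between monotonic and linear correlation that the text describes.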
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
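The pair-counting formula above corresponds to the variant usually called tau-a. The following sketch implements it directly. The function name is hypothetical, and statistical libraries often apply a tie correction (tau-b) instead, so results can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # direction == 0 means a tie in x or y; the pair counts as neither
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

# Of the 6 pairs, 5 are concordant and 1 is discordant.
tau = kendall_tau_a(x, y)
```

This brute-force version examines every pair, so its running time grows quadratically with the number of observations. It is meant to illustrate the definition rather than to process a dataset of this size.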
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The Phik project provides extensive documentation.
First rows
| | df_index | index_train | length | car_number | destination_esr | adm | danger | gruz | loaded | operation_car | operation_date | operation_st_esr | operation_st_id | operation_train | receiver | rodvag | rod_train | sender | ssp_station_esr | ssp_station_id | tare_weight | weight_brutto |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9 | 9.100008e+14 | NaN | 62845730 | 967808.0 | NaN | NaN | NaN | 1.0 | 3.0 | 2020-07-16 00:16:00 | 967600.0 | 2.000039e+09 | 4.0 | NaN | 60.0 | 10.0 | NaN | 967600.0 | 2.000039e+09 | NaN | 7042.0 |
| 1 | 17 | 9.700017e+14 | NaN | 62845078 | 872701.0 | NaN | NaN | NaN | 2.0 | 3.0 | 2020-07-15 19:56:00 | 872701.0 | 2.001931e+09 | 4.0 | NaN | 60.0 | NaN | NaN | 872701.0 | 2.001931e+09 | NaN | NaN |
| 2 | 25 | 9.171039e+14 | NaN | 62847009 | 913206.0 | NaN | NaN | NaN | 2.0 | 3.0 | 2020-07-16 12:41:00 | 913206.0 | 2.000036e+09 | 4.0 | NaN | 60.0 | NaN | NaN | 913206.0 | 2.000036e+09 | NaN | NaN |
| 3 | 26 | 9.700017e+14 | NaN | 62847025 | 913206.0 | NaN | NaN | NaN | 2.0 | 3.0 | 2020-07-16 12:40:00 | 913206.0 | 2.000036e+09 | 4.0 | NaN | 60.0 | NaN | NaN | 913206.0 | 2.000036e+09 | NaN | NaN |
| 4 | 34 | 8.902013e+14 | NaN | 62846217 | 864300.0 | NaN | NaN | NaN | 2.0 | 3.0 | 2020-07-16 17:26:00 | 864300.0 | 2.001934e+09 | 4.0 | NaN | 60.0 | NaN | NaN | 864300.0 | 2.001934e+09 | NaN | NaN |
| 5 | 38 | 9.700017e+14 | NaN | 62846050 | 913206.0 | NaN | NaN | NaN | 2.0 | 3.0 | 2020-07-16 12:40:00 | 913206.0 | 2.000036e+09 | 4.0 | NaN | 60.0 | NaN | NaN | 913206.0 | 2.000036e+09 | NaN | NaN |
| 6 | 43 | 9.379066e+14 | NaN | 62843032 | 967808.0 | NaN | NaN | NaN | 1.0 | 3.0 | 2020-07-16 08:55:00 | 967600.0 | 2.000039e+09 | 4.0 | NaN | 60.0 | 58.0 | NaN | 967600.0 | 2.000039e+09 | NaN | 7044.0 |
| 7 | 64 | 8.626063e+14 | NaN | 62844071 | 986103.0 | NaN | NaN | NaN | 1.0 | 3.0 | 2020-07-15 23:00:00 | 985906.0 | 2.000039e+09 | 4.0 | NaN | 60.0 | NaN | NaN | 985906.0 | 2.000039e+09 | NaN | NaN |
| 8 | 70 | 9.676009e+14 | NaN | 62843529 | 937906.0 | NaN | NaN | NaN | 2.0 | 3.0 | 2020-07-16 16:43:00 | 937906.0 | 2.000038e+09 | 4.0 | NaN | 60.0 | NaN | NaN | 937906.0 | 2.000038e+09 | NaN | NaN |
| 9 | 73 | 8.621089e+14 | NaN | 62841184 | 864601.0 | NaN | NaN | NaN | 1.0 | 3.0 | 2020-07-16 03:00:00 | 864601.0 | 2.001931e+09 | 4.0 | NaN | 60.0 | NaN | NaN | 864601.0 | 2.001931e+09 | NaN | NaN |
Last rows
| | df_index | index_train | length | car_number | destination_esr | adm | danger | gruz | loaded | operation_car | operation_date | operation_st_esr | operation_st_id | operation_train | receiver | rodvag | rod_train | sender | ssp_station_esr | ssp_station_id | tare_weight | weight_brutto |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 597649 | 4189854 | 8.528019e+14 | NaN | 62818117 | 986103.0 | NaN | NaN | NaN | 1.0 | 3.0 | 2020-07-16 18:00:00 | 985906.0 | 2.000039e+09 | 4.0 | NaN | 60.0 | NaN | NaN | 985906.0 | 2.000039e+09 | NaN | NaN |
| 597650 | 4189860 | 8.527081e+14 | NaN | 62815881 | 980200.0 | NaN | NaN | NaN | 1.0 | 3.0 | 2020-07-15 18:30:00 | 980003.0 | 2.000039e+09 | 4.0 | NaN | 60.0 | NaN | NaN | 980003.0 | 2.000039e+09 | NaN | NaN |
| 597651 | 4189869 | 8.621081e+14 | NaN | 62815279 | 967808.0 | NaN | NaN | NaN | 1.0 | 3.0 | 2020-07-16 16:08:00 | 967600.0 | 2.000039e+09 | 4.0 | NaN | 60.0 | NaN | NaN | 967600.0 | 2.000039e+09 | NaN | NaN |
| 597652 | 4189876 | 8.621081e+14 | NaN | 62815170 | 967808.0 | NaN | NaN | NaN | 1.0 | 3.0 | 2020-07-16 16:08:00 | 967600.0 | 2.000039e+09 | 4.0 | NaN | 60.0 | NaN | NaN | 967600.0 | 2.000039e+09 | NaN | NaN |
| 597653 | 4189885 | 8.302009e+14 | NaN | 62816137 | 862201.0 | NaN | NaN | NaN | 2.0 | 3.0 | 2020-07-15 20:00:00 | 862201.0 | 2.001931e+09 | 4.0 | NaN | 60.0 | 10.0 | NaN | 862201.0 | 2.001931e+09 | NaN | 1756.0 |
| 597654 | 4189889 | 8.621081e+14 | NaN | 62814181 | 967808.0 | NaN | NaN | NaN | 1.0 | 3.0 | 2020-07-16 16:08:00 | 967600.0 | 2.000039e+09 | 4.0 | NaN | 60.0 | NaN | NaN | 967600.0 | 2.000039e+09 | NaN | NaN |
| 597655 | 4189893 | 9.171039e+14 | NaN | 62814041 | 967808.0 | NaN | NaN | NaN | 1.0 | 3.0 | 2020-07-16 15:13:00 | 967600.0 | 2.000039e+09 | 4.0 | NaN | 60.0 | 10.0 | NaN | 967600.0 | 2.000039e+09 | NaN | 5602.0 |
| 597656 | 4189901 | 8.621081e+14 | NaN | 62813555 | 967808.0 | NaN | NaN | NaN | 1.0 | 3.0 | 2020-07-16 16:08:00 | 967600.0 | 2.000039e+09 | 4.0 | NaN | 60.0 | NaN | NaN | 967600.0 | 2.000039e+09 | NaN | NaN |
| 597657 | 4189906 | 8.623051e+14 | NaN | 62813316 | 872504.0 | NaN | NaN | NaN | 2.0 | 3.0 | 2020-07-15 22:20:00 | 872504.0 | 2.001931e+09 | 72.0 | NaN | 60.0 | NaN | NaN | 870000.0 | 2.001931e+09 | NaN | NaN |
| 597658 | 4189912 | 8.302009e+14 | NaN | 62827910 | 862201.0 | NaN | NaN | NaN | 2.0 | 3.0 | 2020-07-15 20:00:00 | 862201.0 | 2.001931e+09 | 4.0 | NaN | 60.0 | 10.0 | NaN | 862201.0 | 2.001931e+09 | NaN | 1756.0 |